Pneumonia detection with QCSA network on chest X-ray

Worldwide, pneumonia is the leading cause of infant mortality. Experienced radiologists use chest X-rays to diagnose pneumonia and other respiratory diseases. The complexity of the diagnostic procedure causes radiologists to disagree on the diagnosis. Early diagnosis is the only feasible strategy for mitigating the disease's impact on the patient. Computer-aided diagnostics improve the accuracy of diagnosis. Recent studies have established that quaternion neural networks classify and predict better than real-valued neural networks, especially when dealing with multi-dimensional or multi-channel input. The attention mechanism is derived from the human brain's visual and cognitive ability to focus on some portion of an image and ignore the rest. The attention mechanism maximizes the usage of the image's relevant aspects, hence boosting classification accuracy. In the current work, we propose the QCSA network (Quaternion Channel-Spatial Attention Network), which combines spatial and channel attention mechanisms with a quaternion residual network to classify chest X-ray images for pneumonia detection. We used a Kaggle X-ray dataset. The suggested architecture achieved 94.53% accuracy and 0.89 AUC. We also show that performance improves when the attention mechanism is integrated into the QCNN. Our results indicate that our approach to detecting pneumonia is promising.

The attention mechanism 4,5 has attracted significant interest in computer vision systems for object recognition and scene interpretation in the past few years. By focusing only on the relevant parts of an object, the human visual system enables people to discriminate objects quickly when they are viewed. This capability of the human brain has inspired the use of attention mechanisms in deep neural networks 6 . The attention mechanism was utilized more frequently in tasks linked to natural language processing 7 . It has also lately been used for image classification tasks 8 to produce state-of-the-art results. Channel and spatial attention 9 are the two typical types of attention mechanisms applied in computer vision tasks.
Recently, researchers [10][11][12][13] have experimented with the quaternion extension of CNN and obtained results that outperform real-valued CNNs. In this experiment, channel and spatial attention modules were employed in a quaternion residual network to improve the performance of predicting pneumonia from CXR images. The capability of quaternions to adequately describe spatial transformations and evaluate multi-channel data makes them an intriguing candidate for computer vision applications.
The novelty of the proposed work is that we have incorporated a spatial attention layer between the layers of the quaternion convolutional neural network. This enables the network to learn important regions of the chest X-ray images while attending to complex spatial features, thereby improving the accuracy of pneumonia detection. Our analysis of the feature map and the attention map shows that the QCSA network effectively learns features from important regions of the chest X-ray images, which leads to better performance of the proposed framework in detecting pneumonia.

Major contributions.
Our contributions in this experiment are as follows.
1. We first built a residual quaternion architecture and evaluated its performance for pneumonia detection on the CXR dataset.
2. We then incorporated spatial and channel attention modules into the architecture from item 1, keeping all hyperparameter values the same for both architectures, and evaluated the performance of this attention-augmented architecture.
3. We then compared the performance of the two architectures to measure the influence of incorporating the spatial and channel attention modules.
The remainder of this work is structured as follows. The "Background and similar works" section presents the background necessary for the proposed work and recent research results in areas connected to it. The "Materials and methods" section outlines the properties of the dataset used and the proposed design. The "Experimental analysis" section presents the hardware infrastructure, performance metrics, and experimentation details, and the "Analysis of result" section discusses the results. The conclusion and future scope of the proposed work are given in the "Conclusion and future work" section.

Background and similar works
Here, we give the necessary background ideas for the suggested design as well as a comparative study of the findings of other recent investigations related to the same problem area. QCNN 14 is an extension of the real CNN model.

Quaternion convolution neural network (QCNN).
The quaternions form a four-dimensional vector space with basis 1, i, j, and k. This space splits into two orthogonal subspaces: a scalar subspace of one dimension and a pure subspace of three dimensions. Quaternions are numbers with one real component and three imaginary components. Quaternion neural networks (QNNs) are a more recent form of neural network that uses quaternion-valued inputs, activations, and parameters. Each of the three imaginary components may encode a color component of an RGB image, making quaternions appropriate for image processing. In recent years, numerous proposed models have outperformed their real-valued equivalents in tasks such as image processing and speech recognition 15,16 . Moreover, quaternion-valued networks benefit from parameter sharing as a result of the interactions of the Hamilton product 17 , resulting in models that require fewer parameters and less storage space and are hence smaller. These benefits can be obtained by substituting quaternionic layers for conventional (real-valued) layers, hence lowering model size without a perceptible decrease in performance. Inputs and layers of a QNN have quaternion values as opposed to real values. Although the work on quaternion representations for deep learning is in its infancy, a few papers analyzing their value have been published. Deep quaternion networks have been used specifically for classification 18,19 and segmentation 20 . According to this research, quaternions offer superior results for a variety of tasks while requiring fewer parameters. QCNNs were developed in order to represent color images correctly in the quaternion domain. QCNN models for color image classification 21 and denoising were found to outperform traditional CNNs. The authors of 22 studied the influence of the Hamilton product on the grayscale-only reconstruction of color images.
www.nature.com/scientificreports/
To reconstruct a unique grayscale image, a quaternion convolutional encoder-decoder architecture is created in 12 . In contrast to standard convolutional encoder-decoder networks, this method can efficiently learn to reconstruct an image's colors from its grayscale representation. The authors conclude that quaternion-valued systems capture both internal and global dependencies, making them suited for applications involving image recognition. Quaternion Recurrent Neural Networks (QRNNs) are proposed by the same authors 23 for sequential tasks such as speech recognition. Their quaternion-based recurrent designs beat non-quaternion-based alternatives despite having two to three times fewer parameters. Figure 2 shows the building blocks that adapt a conventional CNN into a quaternion CNN.
Algebra of quaternion numbers. This section describes the identities and properties 24 of quaternion numbers.
Eq. (1) gives the notation for a quaternion Q.
Furthermore, the imaginary components of a quaternion satisfy the relations expressed by Eq. (2).
As seen in Eq. (3), the product of two quaternions violates the commutative property.
In the quaternion domain, r represents the scalar component, x, y, and z represent the imaginary components in xi + yj + zk, and v represents the vector component. This is represented by Eq. (4).
The conjugate of Q is given by Eq. (5).
The inverse $Q^{-1}$ of a quaternion Q is defined by the expression given in Eq. (7).
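The numbered equations cited in this subsection do not appear alongside the text. As a hedged reconstruction, the standard identities of quaternion algebra that match the descriptions above are listed below. The symbols $Q^{*}$ for the conjugate and $|Q|$ for the norm are assumed notation, and no correspondence with the original equation numbers is implied.

```latex
\begin{align*}
Q &= r + x\,i + y\,j + z\,k, \qquad r, x, y, z \in \mathbb{R} \\
i^{2} &= j^{2} = k^{2} = ijk = -1 \\
ij &= -ji = k, \qquad jk = -kj = i, \qquad ki = -ik = j \\
Q &= (r, \mathbf{v}), \qquad \mathbf{v} = x\,i + y\,j + z\,k \\
Q^{*} &= r - x\,i - y\,j - z\,k \\
|Q| &= \sqrt{Q\,Q^{*}} = \sqrt{r^{2} + x^{2} + y^{2} + z^{2}} \\
Q^{-1} &= \frac{Q^{*}}{|Q|^{2}}, \qquad Q \neq 0
\end{align*}
```

The third line makes the non-commutativity explicit: swapping the order of two distinct imaginary units flips the sign of the product.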
Just like a complex number, a quaternion number can also be represented as in Eq. (8).
Here, $\rho = |Q|$, $\theta$ is a real quantity, and $s$ is a pure imaginary quaternion of unit length.
A three-dimensional vector Q is rotated by an angle about a rotation axis w to obtain a new vector p. This rotation is described by Eqs. (9) and (10), in the form $p = w \cdot Q \cdot w^{*}$, where $w^{*}$ denotes the conjugate of w, and p and Q are pure quaternions, that is, quaternions whose real component is zero.
The quaternion convolution applies scaling and rotation between the quaternion input and the quaternion filter. Here, w is a quaternion filter of size F, and Q is a quaternion matrix of size N. The quaternion convolution can then be written as in Eq. (11), where $S = N - F + 1$ and $T = N - F + 1$ are the dimensions of the output. In Eq. (11), s stands for the scaling component, the rotation axis has unit length, and the rotation angle varies within a bounded range. Due to the Hamilton product, as indicated in Eq. (11), a QNN can represent the local and global dependencies among the features of a multi-channel input.
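The rotation and the output-size rule described above can be checked numerically. The following is a minimal sketch, assuming the standard unit-quaternion rotation $p = w\,Q\,w^{*}$ with $w = \cos(\theta/2) + \sin(\theta/2)\,u$ for a unit axis $u$; the function names `hamilton`, `conjugate`, `rotate`, and `valid_output_size` are illustrative and do not come from the paper.

```python
import numpy as np

def hamilton(a, b):
    """Hamilton product of two quaternions stored as (r, x, y, z)."""
    r1, x1, y1, z1 = a
    r2, x2, y2, z2 = b
    return np.array([
        r1 * r2 - x1 * x2 - y1 * y2 - z1 * z2,
        r1 * x2 + x1 * r2 + y1 * z2 - z1 * y2,
        r1 * y2 - x1 * z2 + y1 * r2 + z1 * x2,
        r1 * z2 + x1 * y2 - y1 * x2 + z1 * r2,
    ])

def conjugate(q):
    """Negate the three imaginary components."""
    return np.asarray(q, dtype=float) * np.array([1.0, -1.0, -1.0, -1.0])

def rotate(vec3, axis, angle):
    """Rotate a 3-D vector about `axis` by `angle` (radians) via w * Q * conj(w)."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)            # unit-length rotation axis
    w = np.concatenate(([np.cos(angle / 2)], np.sin(angle / 2) * axis))
    q = np.concatenate(([0.0], np.asarray(vec3, dtype=float)))  # pure quaternion
    p = hamilton(hamilton(w, q), conjugate(w))
    return p[1:]                                  # real part of the result is zero

def valid_output_size(n, f):
    """Output side length of a convolution of an N x N input with an F x F filter."""
    return n - f + 1

# Rotating the x unit vector by 90 degrees about the z axis gives the y unit vector.
rotated = rotate([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], np.pi / 2)
```

Because `w` has unit norm here, its conjugate equals its inverse, so the result stays a pure quaternion of the same length as the input vector.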
Hamilton product. In a QCNN, the Hamilton product is used in place of the conventional real-valued dot product to carry out transformations between two quaternions $Q_1 = r_1 + x_1 i + y_1 j + z_1 k$ and $W_1 = r_2 + x_2 i + y_2 j + z_2 k$. The $\otimes$ operator denotes the Hamilton product of $Q_1$ and $W_1$, which is defined as in Eq. (12):
$$
\begin{aligned}
Q_1 \otimes W_1 = {} & (r_1 r_2 - x_1 x_2 - y_1 y_2 - z_1 z_2) \\
& + (r_1 x_2 + x_1 r_2 + y_1 z_2 - z_1 y_2)\, i \\
& + (r_1 y_2 - x_1 z_2 + y_1 r_2 + z_1 x_2)\, j \\
& + (r_1 z_2 + x_1 y_2 - y_1 x_2 + z_1 r_2)\, k
\end{aligned}
$$
The Hamilton product enables a QNN to discover latent interactions within the quaternion's features. During the Hamilton product in a QNN, the quaternion-weight components are shared across multiple quaternion-input components, hence forming connections between the elements. In a real-valued neural network, the multiple weights necessary to encode latent relations within a feature are learned at the same level as the global dependencies between different features, whereas the quaternion weight w encodes these interrelations within a unique quaternion $Q_{out}$ during the Hamilton product.
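The weight sharing described above becomes visible when the Hamilton product is written as a real matrix acting on the four components of the input. The sketch below is illustrative only: `hamilton_matrix` and `hamilton` are hypothetical helper names, and quaternions are stored as arrays `(r, x, y, z)`. It shows that four weight components fill a 4 x 4 real matrix, where an unconstrained real-valued layer would need sixteen independent weights.

```python
import numpy as np

def hamilton_matrix(w):
    """Real 4 x 4 matrix M such that M @ q equals the Hamilton product w (x) q."""
    r, x, y, z = w
    return np.array([
        [r, -x, -y, -z],
        [x,  r, -z,  y],
        [y,  z,  r, -x],
        [z, -y,  x,  r],
    ], dtype=float)

def hamilton(w, q):
    """Hamilton product w (x) q for quaternions stored as (r, x, y, z)."""
    return hamilton_matrix(w) @ np.asarray(q, dtype=float)

i = np.array([0.0, 1.0, 0.0, 0.0])
j = np.array([0.0, 0.0, 1.0, 0.0])
k = np.array([0.0, 0.0, 0.0, 1.0])

ij = hamilton(i, j)   # equals k
ji = hamilton(j, i)   # equals -k, so the product is not commutative

# Sixteen matrix entries are built from only four distinct weight magnitudes.
shared = np.unique(np.abs(hamilton_matrix([1.0, 2.0, 3.0, 4.0])))
```

Each row of the matrix reuses the same four numbers with different signs and positions, which is one way to read the statement that the weight components are shared across the input components.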
Attention mechanism. Image attention involves finding a target region as the eye rapidly scans the image. To collect the most accurate target data while reducing unnecessary data, this target region is weighted (distributed).
Combining spatial and channel attention in the quaternion residual network produces superior results for the following reasons. First, when feature maps are multiplied by small channel-attention activation values, a substantial quantity of feature map information is discarded. Second, spatial attention highlights regions of interest rather than whole feature maps: when channel attention reduces the information in individual feature maps, spatial attention can still highlight numerous significant regions of each feature map by employing the attention mask of a different branch. In the last phase, the output feature maps of the two attention processes are concatenated. The characteristics of interest are amplified in the fused feature maps, while redundant features are removed.
Soft attention 25,26 is the most popular form because it is differentiable and allows CNN models to be trained end to end. Most soft attention models employ an attention template to locate distinctive aspects for aligning the weights of discrete sequences or image segments. Hard attention, as opposed to soft attention, is a stochastic, non-differentiable procedure that analyzes distinct regions rather than the image's primary characteristics. The attention network for image classification can determine the weight of an image's attention spectrum from the arithmetic mean of attention. The method can gather image-based attention in the same way as in natural language processing. Because it collects features from data, a deep neural network can classify images pixel-wise. The attention mechanism 27 mimics human vision and helps identify significant characteristics quickly and precisely. A CNN processes all image information and details in all convolution layers. Multiple convolution layers and global average pooling in the last layer average the image's characteristics and attributes. The network's last affine fully connected layer determines the image classification. Background and other non-essential information have a greater impact on classification results as image size decreases. Large quantities of data, together with a neural network that learns to suppress background information, prevent inaccurate outcomes.
One way to generate a single map from two or more convolution layers is to branch the output of one layer. We use the sigmoid as the activation function of the convolution output in the branch, so that it produces a value between zero and one for each pixel. The result of this convolution is multiplied with the initial output. The two further layers estimate the weight given to each output value. Values near zero are unimportant, so this configuration removes most features whose sigmoid values approach zero from the downstream recognition process. Configuring a neural network to estimate the area of focus in this way is the most common way to use attention for image classification.
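The gating step described above, in which a sigmoid mask between zero and one multiplies the original output, can be sketched in a few lines. This is a minimal NumPy illustration under assumed names (`gate_with_mask`, `mask_logits`); the mask logits stand in for the output of the branch convolution, which is not specified in the text.

```python
import numpy as np

def sigmoid(x):
    """Squash real values into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def gate_with_mask(features, mask_logits):
    """Multiply features element-wise by a sigmoid attention mask.

    `mask_logits` is a stand-in for the branch convolution output and
    must have the same shape as `features`.
    """
    mask = sigmoid(np.asarray(mask_logits, dtype=float))
    return np.asarray(features, dtype=float) * mask, mask

features = np.ones((2, 2))
mask_logits = np.array([[10.0, -10.0],
                        [0.0, 0.0]])
gated, mask = gate_with_mask(features, mask_logits)
# Pixels with strongly negative logits are almost removed from the output,
# while pixels with strongly positive logits pass nearly unchanged.
```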
The literature 27 describes two attention strategies inspired by the visual system. The first is a top-down method that iteratively selects the correct region from a scene record pool. The second, bottom-up approach highlights the most salient locations along the visual path. Top-down iteration is slower than bottom-up. The bottom-up technique selects the most relevant regions from incoming data progressively, although sequential processing increases errors with depth.
The attention mechanism is a prominent study topic for several reasons. First, models equipped with an attention mechanism have outperformed baseline techniques. Second, the attention model can be trained jointly with a base recurrent neural network using backpropagation. Third, the introduction of the transformer model 28 , which has been widely used in image processing, video processing, and recommendation systems, improved the attention model and avoided the parallelization problem of recurrent neural networks.
Classification neural networks model data as a numeric vector of low-level features that all receive the same weight regardless of their relevance. The attention model instead assigns weights to features based on their relevance: it computes the weight distribution from the input features and assigns greater values to features with high rank.
The attention mechanism has three layers: alignment, attention weight, and context vector. The attention layer calculates the alignment score between the encoded vector $h = \{h_1, h_2, \ldots, h_n\}$ and a vector $v$. As stated in Eqs. (13) and (14), the softmax computes the probability distribution $\alpha_i$ by normalizing over all n elements of h, where $i = 1, 2, \ldots, n$.
From the equations above, $h_i$ provides vector $v$ with vital information. The attention mechanism output $O$ is a weighted sum of the encoded vectors $h_i$.
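The three steps above (alignment score, softmax weights, weighted sum) can be sketched as follows. The scoring function is an assumption: a plain dot product between each $h_i$ and $v$ is used here because the exact form of Eqs. (13) and (14) is not shown in the text. The name `soft_attention` is illustrative.

```python
import numpy as np

def soft_attention(h, v):
    """Return attention weights alpha and the output O for encoded vectors h.

    h: array of shape (n, d), one encoded vector per row.
    v: array of shape (d,).
    Assumes a dot-product alignment score, which is one common choice.
    """
    h = np.asarray(h, dtype=float)
    v = np.asarray(v, dtype=float)
    scores = h @ v                          # alignment score per element, shape (n,)
    scores = scores - scores.max()          # shift for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()   # softmax over the n elements
    output = alpha @ h                      # weighted sum of the encoded vectors
    return alpha, output

h = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 0.0]])
v = np.array([1.0, 0.0])
alpha, output = soft_attention(h, v)
# The third vector aligns best with v, so it receives the largest weight.
```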
In the proposed work, we have combined channel attention and spatial attention mechanism in quaternion residual networks.
Channel attention. Using the inter-channel relationship between features, a channel attention 26,29,30 map is created. As each channel of a feature map is seen as a feature detector, channel attention focuses on global features. It reduces the spatial dimension of the input feature map in order to compute channel attention efficiently. The channel attention method generates a sigmoid-activated one-dimensional (1-D) tensor for the given feature maps. Along the channel axis, some activation values of the 1-D tensor are expected to be larger for the feature maps of interest, while others are expected to be smaller so as to suppress redundant feature maps. We generate two spatial context descriptors, $F^{c}_{avg}$ and $F^{c}_{max}$, which stand for average-pooled features and max-pooled features, respectively.
Spatial attention. On the basis of the inter-spatial relationship between features, a spatial attention map is generated. In contrast to channel attention, which focuses on which channels are informative, the spatial attention module emphasizes where an important feature is located. To compute spatial attention, we first apply the average-pooling and max-pooling operations along the channel axis, then concatenate the results to provide a useful feature descriptor. The concatenated feature descriptor is passed through a convolution layer to build a spatial attention map that encodes where to highlight or suppress.
Figure 3 shows how we placed the channel attention and spatial attention blocks inside the building block of the QCNN. These spatial and channel blocks were compatible with quaternion inputs. Adding the channel and spatial attention blocks does not increase the learnable parameters and hence does not add computational cost.
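The pooling steps described for the two attention maps can be illustrated with a small real-valued NumPy sketch. Several parts are assumptions rather than the paper's design: the shared two-layer transform applied to the pooled descriptors follows a common CBAM-style layout, the arrays `w1`, `w2`, and `kernel` are random placeholders, and the sketch operates on real-valued tensors rather than quaternion ones.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """1-D channel attention for a feature map x of shape (H, W, C).

    w1: (C, C_reduced) and w2: (C_reduced, C) are placeholder weights for an
    assumed shared two-layer transform applied to both pooled descriptors.
    """
    f_avg = x.mean(axis=(0, 1))             # average-pooled descriptor, shape (C,)
    f_max = x.max(axis=(0, 1))              # max-pooled descriptor, shape (C,)
    transform = lambda f: np.maximum(f @ w1, 0.0) @ w2
    return sigmoid(transform(f_avg) + transform(f_max))

def spatial_attention(x, kernel):
    """2-D spatial attention for x of shape (H, W, C); kernel has shape (k, k, 2), k odd."""
    # Pool along the channel axis and stack the two results as a 2-channel descriptor.
    desc = np.stack([x.mean(axis=2), x.max(axis=2)], axis=2)
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(desc, ((pad, pad), (pad, pad), (0, 0)))
    height, width = x.shape[:2]
    out = np.empty((height, width))
    for row in range(height):               # naive same-size convolution
        for col in range(width):
            out[row, col] = np.sum(padded[row:row + k, col:col + k, :] * kernel)
    return sigmoid(out)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))
ca = channel_attention(x, rng.standard_normal((4, 2)), rng.standard_normal((2, 4)))
sa = spatial_attention(x, rng.standard_normal((7, 7, 2)))
channel_refined = x * ca                    # broadcast over height and width
spatial_refined = x * sa[..., None]         # broadcast over channels
```

The channel map has one value per channel and the spatial map one value per pixel, so each rescales the feature map along a different axis.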
Comparison of recent related studies. Pneumonia detection via CXR has been an unresolved issue for many years, with the lack of publicly available data constituting the primary limitation. Extensive research has

Materials and methods
Dataset. The dataset 42 (https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia) is organized into train, test, and validation directories, with a subdirectory for each image class (Pneumonia/Normal) within each directory. There are 5,856 CXR images in JPEG format, split into two categories (P/N). The CXR images were chosen retrospectively from cohorts of children aged one to five years at the Guangzhou Women and Children's Medical Center. The CXRs were routinely taken as part of the patients' care. Before the images could be used to train an AI system, two expert physicians reviewed them. A third expert checked the evaluation set more thoroughly to account for any potential grading errors. The training set comprises 5136 images, whereas the test set has only 700. Table 2 displays the number of images in each class. Table 3 demonstrates that 75% of the dataset has been allocated to the training set, 80% to the test set, and 20% to the validation set.
Proposed framework. The proposed method comprises image preprocessing with an image enhancement technique and image resizing, dataset imbalance handling, augmentation of training images, the transformation of input images into the quaternion domain, training on a Quaternion residual network with spatial and channel attention modules, and evaluation of Pneumonia classification with the proposed model. Figure 6 depicts our suggested design, which augments the structure of quaternion residual network architecture with channel and spatial attention modules.
Data preprocessing. For image normalization, the images are converted into arrays and divided by 255. This scales the pixel values of each image to the range 0.0 to 1.0 and helps reduce irregularities caused by shadows and illumination.
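The normalization step can be written in one line. The sketch below assumes 8-bit input images; `normalize_image` is an illustrative name.

```python
import numpy as np

def normalize_image(image_uint8):
    """Scale 8-bit pixel values to floating-point values in [0.0, 1.0]."""
    return np.asarray(image_uint8, dtype=np.float32) / 255.0

pixels = np.array([[0, 128, 255]], dtype=np.uint8)
normalized = normalize_image(pixels)
```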
Image enhancement. Image quality affects performance, so we applied image enhancement, which also maintains uniformity across the input images of the entire dataset.
Data augmentation. By applying various types of transformations to the input images, the challenges of a small dataset size are mitigated.
Dataset balancing. It is done to maintain a balance between the input data size of all dataset classes.
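The text does not state which balancing method was used. As one common option, offered only as an assumption, inverse-frequency class weights can be derived from the class counts in Table 2 (4273 pneumonia and 1583 normal images) so that both classes contribute equally to the training loss. The function name `class_weights` is illustrative.

```python
def class_weights(counts):
    """Inverse-frequency weights: total / (number_of_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * count) for label, count in counts.items()}

# Class counts taken from Table 2 of the paper.
weights = class_weights({"P": 4273, "N": 1583})
# The minority class (N) receives the larger weight, and the weighted class
# totals become equal.
```

Oversampling the minority class or undersampling the majority class are alternatives with the same goal.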
Training of proposed architecture. The preprocessed dataset is projected in quaternion space and trained on the QCSA network.
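How an image is projected into quaternion space is not detailed here. A minimal sketch under stated assumptions follows: the real component is set to zero and the three imaginary components carry the three color channels, as suggested by the earlier remark that each imaginary component may encode a color component. For a single-channel CXR image, the sketch replicates the gray values across the three imaginary components, which is an assumption of this example. The name `to_quaternion` is illustrative.

```python
import numpy as np

def to_quaternion(image):
    """Map an image to a quaternion-valued array of shape (H, W, 4).

    The last axis holds (real, i, j, k). The real part is zero; the imaginary
    parts hold the three color channels. Grayscale input of shape (H, W) is
    replicated to three channels first (an assumption of this sketch).
    """
    image = np.asarray(image, dtype=np.float32)
    if image.ndim == 2:
        image = np.repeat(image[..., None], 3, axis=2)
    real = np.zeros(image.shape[:2] + (1,), dtype=np.float32)
    return np.concatenate([real, image], axis=2)

gray = np.full((4, 5), 0.5, dtype=np.float32)
quaternion_image = to_quaternion(gray)
```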
Evaluation of performance. The trained model is then tested on unseen images to evaluate its performance. Figure 4 shows diagrammatically the steps carried out in our experiment, which include the preprocessing steps on the selected dataset, the design of the proposed architecture, training of the model on the preprocessed dataset, and testing to evaluate the performance of the proposed architecture.
Spatial and channel attention modules focus only on the crucial part of the input and extract features from them only. Figure 5 shows the relative positioning of spatial and channel attention blocks in the proposed architecture. Figure 6 displays the design of the proposed architecture, which shows the detailed structure of the proposed model. In this, we have employed four quaternion residual blocks with attention blocks.

Experimental analysis
Implementation details and hyper-parameter settings. To showcase our proposed architecture, we experimented with a benchmark dataset of CXR images that is one of the most commonly downloaded datasets on Kaggle, and we used it for binary classification. Python 3.7, Anaconda/3, and CUDA/10 are installed on a Windows server with an i5 CPU, 2 GB GPU, and 8 GB RAM. In addition, the Python libraries TensorFlow-Keras, OpenCV, matplotlib, os, math, and NumPy are employed. We trained the system for 40 epochs using the hyperparameters shown in Table 4.

Table 2. Class-wise distribution of the dataset.
Class            # of images
Pneumonia (P)    4273
Normal (N)       1583

Table 3. Train, test, and validation dataset partitioning.
Columns: # of images, # of images from P class, # of images from N class

F1-score. The F1-score combines precision and recall into a single measurement. It is defined as the harmonic mean of precision and recall.
Sensitivity. It is a test's capacity to appropriately detect diseased patients. It is the same as recall.
Specificity. It is a test's ability to correctly identify healthy individuals.
Receiver operating characteristic (ROC). This curve displays the variation of sensitivity with respect to (1 − specificity). It is used to demonstrate the trade-off between sensitivity and specificity.
Performance evaluation of the proposed methodology. In our experiment, we evaluated the performance of pneumonia prediction on two architectures: (i) QCNN without attention blocks and (ii) QCNN with spatial and channel attention blocks. The same set of hyper-parameter values as in Table 4 and the dataset in Table 2 were used for both to allow a comparative analysis. Table 5 presents the performance of both architectures. As shown in Table 5, we observed a rise of roughly 4 percentage points in classification accuracy when attention modules are added to the QCNN architecture.
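The metrics defined in this section can be computed directly from true labels and predictions. The sketch below is illustrative and uses made-up labels and scores, not results from the paper; it assumes label 1 denotes pneumonia and that both classes are present. The names `binary_metrics` and `roc_auc` are hypothetical.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall), specificity, precision, and F1-score."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # harmonic mean
    accuracy = (tp + tn) / y_true.size
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}

def roc_auc(y_true, scores):
    """Area under the ROC curve, sweeping the decision threshold over the scores."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    positives = np.sum(y_true == 1)
    negatives = np.sum(y_true == 0)
    tpr = np.array([np.sum((scores >= t) & (y_true == 1)) / positives for t in thresholds])
    fpr = np.array([np.sum((scores >= t) & (y_true == 0)) / negatives for t in thresholds])
    # Trapezoidal rule over (1 - specificity, sensitivity) points.
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Toy labels and predictions for illustration only.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.6]
metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```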

Analysis of result
The ultimate goal of pneumonia detection using deep learning is to minimize false positive and false negative cases, as they can have significant consequences for patient care. False positives can lead to unnecessary treatments, which can be costly and potentially harmful to the patient, while false negatives can result in delayed diagnosis and treatment, which can be life-threatening. Therefore, in the context of pneumonia detection, it is more important to prioritize accuracy over training and prediction time. We have performed the experiment on this dataset with different deep learning architectures; the results are presented in Table 6 with performance metrics such as accuracy, F1-score, number of trainable parameters, and number of non-trainable parameters. We have presented the accuracy of the models as a bar graph in Fig. 18, which shows that the proposed method performs better by capturing complex features and attending to the important regions of an image.

Conclusion and future work
In this research, we provide a system in which deep learning architectures are adapted to the quaternion domain and augmented with attention modules, consisting of channel attention and spatial attention modules, that focus only on the more relevant portions of the image. Quaternion-customized deep neural network architectures show better classification performance than conventional real-valued DNNs, especially on multi-channel data. This architecture was evaluated on a public Kaggle dataset of CXR images for the detection of pneumonia. We customized the residual network in the quaternion domain. We first evaluated the residual quaternion network on the dataset, and it gave a test accuracy of 90.27%, which is better than the real-valued residual CNN architecture. We then evaluated the quaternion residual network architecture augmented with spatial and channel attention modules, which gave an accuracy of 94.53%. We thus observed a rise of about 4 percentage points in accuracy when the attention mechanism is integrated with the quaternion residual network. The proposed model displays generalization potential when evaluated on distinct data sets. If the proposed architecture is ensembled with
Figure 11. F-1 score curve.